sensitivity value
A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning
As deep learning models expand, the pre-training-fine-tuning paradigm has become the standard approach for handling various downstream tasks. However, shared parameters can lead to diminished performance when dealing with complex datasets involving multiple tasks. While introducing Mixture-of-Experts (MoE) methods has alleviated this issue to some extent, it also significantly increases both the number of parameters required for fine-tuning and the training time, introducing greater parameter redundancy. To address these challenges, we propose LoRA-SMoE (A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning), a method for allocating expert numbers based on parameter sensitivity. This method rapidly assesses the sensitivity of different tasks to parameters by sampling a small amount of data and using gradient information. It then adaptively allocates expert numbers within a given budget. The process maintains memory consumption comparable to LoRA (Low-Rank Adaptation) while ensuring an efficient and resource-friendly fine-tuning procedure. Experimental results demonstrate that, compared to SOTA fine-tuning methods, our LoRA-SMoE approach can enhance model performance while reducing the number of trainable parameters. This significantly improves model performance in resource-constrained environments. Additionally, due to its efficient parameter sensitivity evaluation mechanism, LoRA-SMoE requires minimal computational overhead to optimize expert allocation, making it particularly suitable for scenarios with limited computational resources. All the code in this study will be made publicly available following the acceptance of the paper for publication. Source code is at https://github.com/EMLS-ICTCAS/LoRA-SMoE
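The abstract above describes two steps: scoring parameter sensitivity from gradients on a small data sample, then allocating experts within a budget. The following is a minimal sketch of that idea, not the authors' implementation. The first-order score |weight x gradient|, the proportional largest-remainder allocation, and the names `sensitivity_scores` and `allocate_experts` are all assumptions made for illustration.

```python
def sensitivity_scores(params, grads):
    """Assumed first-order score per layer: sum of |weight * gradient|."""
    return {
        name: sum(abs(w * g) for w, g in zip(params[name], grads[name]))
        for name in params
    }


def allocate_experts(scores, budget, min_experts=1):
    """Split an expert budget across layers in proportion to their scores.

    Hypothetical rule: every layer gets `min_experts`, the rest is shared
    proportionally, and rounding is resolved by the largest remainder.
    """
    names = list(scores)
    if budget < min_experts * len(names):
        raise ValueError("budget too small for the minimum allocation")
    alloc = {name: min_experts for name in names}
    remaining = budget - min_experts * len(names)
    total = sum(scores.values())
    if total == 0:
        # No gradient signal: fall back to an even split.
        quotas = {name: remaining / len(names) for name in names}
    else:
        quotas = {name: remaining * scores[name] / total for name in names}
    for name in names:
        alloc[name] += int(quotas[name])
    leftover = remaining - sum(int(q) for q in quotas.values())
    by_remainder = sorted(
        names, key=lambda name: quotas[name] - int(quotas[name]), reverse=True
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc
```

In this sketch the total always equals the budget, so the number of trainable experts stays fixed while more sensitive layers receive more of them.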
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
Kharyuk, Pavel, Matveev, Sergey, Oseledets, Ivan
Drawing parallels with the way biological networks are studied, we adapt the treatment--control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating the internal inference impacted by input data augmentations. The internal changes in network operation are reflected in activation changes measured by variance, which can be decomposed into components related to each augmentation, employing Sobol indices and Shapley values. These quantities enable one to visualize sensitivity to different variables and use them for guided masking of activations. In addition, we introduce a way of single-class sensitivity analysis where the candidates are filtered according to their matching to prediction bias generated by targeted damaging of the activations. Relying on the observed parallels, we assume that the developed framework can potentially be transferred to studying biological neural networks in complex environments.
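To make the Shapley-value attribution mentioned above concrete, here is a small sketch that computes exact Shapley values for a set function. The value function stands in for "activation variance explained by a subset of augmentations"; the augmentation names and the numbers in the usage note are made up for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi
```

For example, with a hypothetical table `{frozenset(): 0, {"rotation"}: 1, {"blur"}: 2, {"rotation", "blur"}: 5}`, the attributions are 2.0 for rotation and 3.0 for blur, which sum to the total of 5. Exact enumeration grows exponentially with the number of players, so it only suits a handful of augmentations.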
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Pandey, Saurabh Kumar, Vashistha, Sachin, Das, Debrup, Aditya, Somak, Choudhury, Monojit
To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale using this framework is costly because of exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities concerning an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. We perform a case study on CHECKLIST generated sentiment analysis dataset where we show that our algorithm indeed captures intuitively high and low-sensitive words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts using sensitivity values in adversarial example generation improves attack success rate by 15.58%, whereas using sensitivity as an additional reward in adversarial paraphrase generation gives a 12.00% improvement over SOTA approaches. Warning: Contains potentially offensive content.
Simplicity Bias of Transformers to Learn Low Sensitivity Functions
Vasudeva, Bhavya, Fu, Deqing, Zhou, Tianyi, Kau, Elliott, Huang, Youqi, Sharan, Vatsal
Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of the inductive biases that they have and how those biases are different from other neural network architectures remains elusive. Various neural network architectures such as fully connected networks have been found to have a simplicity bias towards simple functions of the data; one version of this simplicity bias is a spectral bias to learn simple functions in the Fourier space. In this work, we identify the notion of sensitivity of the model to random changes in the input as a notion of simplicity bias which provides a unified metric to explain the simplicity and spectral bias of transformers across different data modalities. We show that transformers have lower sensitivity than alternative architectures, such as LSTMs, MLPs and CNNs, across both vision and language tasks. We also show that low-sensitivity bias correlates with improved robustness; furthermore, it can also be used as an efficient intervention to further improve the robustness of transformers.
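As a point of reference for the notion of sensitivity to input changes, the sketch below computes the classical average sensitivity of a Boolean function: the expected number of single-bit flips that change the output. This is the textbook Boolean analogue, assumed here for illustration; the paper's own metric for transformers on vision and language data may be defined differently.

```python
from itertools import product


def average_sensitivity(f, n):
    """Mean, over all n-bit inputs, of the number of bit flips that change f."""
    total = 0
    for bits in product((0, 1), repeat=n):
        fx = f(bits)
        for i in range(n):
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if f(flipped) != fx:
                total += 1
    return total / 2 ** n


def parity(bits):
    return sum(bits) % 2


def majority(bits):
    return int(sum(bits) > len(bits) / 2)
```

Parity is maximally sensitive (every flip changes the output), while majority is much less so, which is the sense in which a low-sensitivity function counts as "simple".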
Automatized Self-Supervised Learning for Skin Lesion Screening
Useini, Vullnet, Tanadini-Lang, Stephanie, Lohmeyer, Quentin, Meboldt, Mirko, Andratschke, Nicolaus, Braun, Ralph P., García, Javier Barranco
The incidence rates of melanoma, the deadliest form of skin cancer, have been increasing steadily worldwide, presenting a significant challenge to dermatologists. Early detection of melanoma is crucial for improving patient survival rates, but identifying suspicious lesions through ugly duckling (UD) screening, the current method used for skin cancer screening, can be challenging and often requires expertise in pigmented lesions. To address these challenges and improve patient outcomes, an artificial intelligence (AI) decision support tool was developed to assist dermatologists in identifying UD from wide-field patient images. The tool uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, which are then sorted by suspiciousness using a self-supervised AI algorithm. A clinical validation study was conducted to evaluate the tool's performance, which demonstrated an average sensitivity of 93% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that dermatologists' confidence increased, and the average majority agreement with the top-10 AI-identified UDs improved to 100% when assisted by AI. The development of this AI decision support tool aims to address the shortage of specialists, enable at-risk patients to receive faster consultations and understand the impact of AI-assisted screening. The tool's automation can assist dermatologists in identifying suspicious lesions and provide a more objective assessment, reducing subjectivity in the screening process. The future steps for this project include expanding the dataset to include histologically confirmed melanoma cases and increasing the number of participants for clinical validation to strengthen the tool's reliability and adapt it for real-world consultation.
Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging
Arasteh, Soroosh Tayebi, Ziller, Alexander, Kuhl, Christiane, Makowski, Marcus, Nebelung, Sven, Braren, Rickmer, Rueckert, Daniel, Truhn, Daniel, Kaissis, Georgios
Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models for chest radiograph diagnosis regarding accuracy and fairness compared to non-private training. For this, we used a large dataset (N=193,311) of high quality clinical chest radiographs, which were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver-operator-characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that the non-private CNNs achieved an average AUROC score of 0.90 ± 0.04 over all labels, whereas the DP CNNs with a privacy budget of epsilon=7.89 resulted in an AUROC of 0.87 ± 0.04, i.e., a mere 2.6% performance decrease compared to non-private training. Furthermore, we found the privacy-preserving training not to amplify discrimination against age, sex or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
The YODO algorithm: An efficient computational framework for sensitivity analysis in Bayesian networks
Ballester-Ripoll, Rafael, Leonelli, Manuele
Sensitivity analysis measures the influence of a Bayesian network's parameters on a quantity of interest defined by the network, such as the probability of a variable taking a specific value. Various sensitivity measures have been defined to quantify such influence, most commonly some function of the quantity of interest's partial derivative with respect to the network's conditional probabilities. However, computing these measures in large networks with thousands of parameters can become computationally very expensive. We propose an algorithm combining automatic differentiation and exact inference to efficiently calculate the sensitivity measures in a single pass. It first marginalizes the whole network once, using e.g. variable elimination, and then backpropagates this operation to obtain the gradient with respect to all input parameters. Our method can be used for one-way and multi-way sensitivity analysis and the derivation of admissible regions. Simulation studies highlight the efficiency of our algorithm by scaling it to massive networks with up to 100'000 parameters and investigate the feasibility of generic multi-way analyses. Our routines are also showcased over two medium-sized Bayesian networks: the first modeling the country-risks of a humanitarian crisis, the second studying the relationship between the use of technology and the psychological effects of forced social isolation during the COVID-19 pandemic. An implementation of the methods using the popular machine learning library PyTorch is freely available.
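The sensitivity measure described above, a partial derivative of the quantity of interest with respect to a conditional probability, can be illustrated on a two-node network A -> B. The sketch below uses a central finite difference in plain Python instead of the automatic differentiation and exact inference that the abstract describes, so it is a stand-in for the idea and not the YODO algorithm. The network, the numbers, and the function names are hypothetical.

```python
def prob_b(p_a, p_b_given_a, p_b_given_not_a):
    """P(B=1) in the network A -> B, obtained by marginalizing over A."""
    return p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a


def one_way_sensitivity(f, theta, eps=1e-6):
    """Central finite-difference estimate of df/dtheta at theta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


# Sensitivity of P(B=1) to theta = P(B=1 | A=1), with P(A=1) = 0.3.
# Analytically, the derivative equals P(A=1).
derivative = one_way_sensitivity(lambda t: prob_b(0.3, t, 0.1), 0.8)
```

A finite difference needs two evaluations per parameter, which is the cost that a single backpropagation pass over all parameters avoids in large networks.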
Continual Learning for Peer-to-Peer Federated Learning: A Study on Automated Brain Metastasis Identification
Huang, Yixing, Bert, Christoph, Fischer, Stefan, Schmidt, Manuel, Dörfler, Arnd, Maier, Andreas, Fietkau, Rainer, Putz, Florian
Due to data privacy constraints, data sharing among multiple centers is restricted. Continual learning, as one approach to peer-to-peer federated learning, can promote multicenter collaboration on deep learning algorithm development by sharing intermediate models instead of training data. This work aims to investigate the feasibility of continual learning for multicenter collaboration on an exemplary application of brain metastasis identification using DeepMedic. 920 T1 MRI contrast enhanced volumes are split to simulate multicenter collaboration scenarios. A continual learning algorithm, synaptic intelligence (SI), is applied to preserve important model weights for training one center after another. In a bilateral collaboration scenario, continual learning with SI achieves a sensitivity of 0.917, and naive continual learning without SI achieves a sensitivity of 0.906, while two models trained on internal data solely without continual learning achieve sensitivity of 0.853 and 0.831 only. In a seven-center multilateral collaboration scenario, the models trained on internal datasets (100 volumes each center) without continual learning obtain a mean sensitivity value of 0.699. With single-visit continual learning (i.e., the shared model visits each center only once during training), the sensitivity is improved to 0.788 and 0.849 without SI and with SI, respectively. With iterative continual learning (i.e., the shared model revisits each center multiple times during training), the sensitivity is further improved to 0.914, which is identical to the sensitivity using mixed data for training. Our experiments demonstrate that continual learning can improve brain metastasis identification performance for centers with limited data. This study demonstrates the feasibility of applying continual learning for peer-to-peer federated learning in multicenter collaboration.
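In this detection setting, sensitivity is the fraction of true positives that the model finds (also called recall). A minimal sketch of the formula follows; the counts in the usage note are invented for illustration and are not the study's data.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity (recall) = TP / (TP + FN)."""
    positives = true_positives + false_negatives
    if positives == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return true_positives / positives
```

For example, `sensitivity(11, 1)` is 11/12, so finding 11 of 12 lesions corresponds to a sensitivity of about 0.917.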
Active Learning-Based Optimization of Scientific Experimental Design
Active learning (AL) is a machine learning algorithm that can achieve greater accuracy with fewer labeled training instances because it can ask oracles to label the most valuable unlabeled data, chosen iteratively and heuristically by query strategies. Scientific experiments nowadays, though increasingly automated, still suffer from human involvement in the design process and from exhaustive search in the experimental space. This article performs a retrospective study on a drug response dataset using the proposed AL scheme, comprised of the matrix factorization method of alternating least squares (ALS) and deep neural networks (DNN). This article also proposes an AL query strategy based on expected loss minimization (ELM). As a result, the retrospective study demonstrates that scientific experimental design, instead of being manually set, can be optimized by AL, and the proposed query strategy, ELM sampling, shows better experimental performance than others such as random sampling and uncertainty sampling.
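For context on the baselines named above, the following sketch shows entropy-based uncertainty sampling, a standard query strategy. It is not the proposed ELM strategy, whose details the abstract does not give. The pool contents and function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(pool_probs, k=1):
    """Return the k unlabeled items whose predictions have the highest entropy."""
    ranked = sorted(
        pool_probs, key=lambda item: entropy(pool_probs[item]), reverse=True
    )
    return ranked[:k]
```

Given a pool such as `{"x1": [0.9, 0.1], "x2": [0.5, 0.5], "x3": [0.7, 0.3]}`, the strategy queries `x2` first because its prediction is the least certain.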
An In-Vehicle KWS System with Multi-Source Fusion for Vehicle Applications
Tan, Yue, Zheng, Kan, Lei, Lei
Abstract: In order to maximize the detection precision rate as well as the recall rate, this paper proposes an in-vehicle multi-source fusion scheme in a Keyword Spotting (KWS) System for vehicle applications. Vehicle information, as a new source for the original system, is collected by an in-vehicle data acquisition platform while the user is driving. A Deep Neural Network (DNN) is trained to extract acoustic features and make a speech classification. Based on the posterior probabilities obtained from the DNN, the vehicle information, including the speed and direction of the vehicle, is applied to choose the suitable parameter from a pair of sensitivity values for the KWS system. The experimental results show that the KWS system with the proposed multi-source fusion scheme can achieve better performance in terms of precision rate, recall rate, and mean square error compared to the system without it.
I. INTRODUCTION
Keyword Spotting (KWS) System, also known as wakeword detection, refers to the task of detecting a specified keyword from a continuous stream of audio provided by the users [1]. Keyword Spotting has been an active research area in speech recognition for decades, and is widely used in numerous applications.